1.  Introduction


This  study  was  conducted  in  support  of  a  situational  understanding  Army  technology  objective 
(ATO).  The  objective  of  this  ATO  is  to  “develop,  demonstrate,  and  transition  unit  of  action 
Soldier  information  system  interface  guidelines  that  facilitate  Soldiers  gaining  situational 
understanding  and  enable  planning  and  acting  within  the  adversary’s  decision  cycle.” 

The  strategy  selected  to  achieve  this  objective  involved  the  development  of  models  of  Soldier- 
operator  functions  and  tasks  via  the  Improved  Performance  Research  Integration  Tool 
(IMPRINT).  The  IMPRINT  models  helped  to  identify  the  frequency  at  which  workload  levels 
exceeded  a  specified  threshold,  the  tasks  that  contributed  most  often  to  these  workload  peaks, 
and  the  mental  resources  for  which  concurrent  tasks  competed.  Interviews  were  conducted  with 
experienced  vehicle  commanders  and  gunners  to  discuss  suspected  problem  areas  and  potential 
solutions  that  might  reduce  workload  and  enhance  situational  awareness  (SA).  Questionnaires 
were  also  administered  to  derive  a  prioritized  list  of  critical  information  requirements  (CIRs). 
Candidate  solutions  related  to  the  design  of  the  information  display  were  identified  based  on 
principles  of  attention  management  (Wickens  &  Hollands,  2000).  This  study  is  one  of  a  series  of 
investigations  designed  to  evaluate  these  potential  solutions  and  a  first  step  in  iterative  analyses 
(i.e.,  modeling  and  experimentation),  which  will  be  performed  to  assess  the  effects  of  these 
interventions  on  SA  and  decision  cycle  time. 

The  rationale  for  the  present  study  was  based  on  the  results  of  an  IMPRINT  model  and  analyses 
of  the  functions  and  tasks  of  the  two-person  crew  (commander-gunner  and  driver)  and  squad 
leader  in  the  infantry  carrier  vehicle  (ICV)  (Mitchell,  Samms,  Glumm,  Krausman,  Brelsford,  & 
Garrett,  2004).  Multiple  runs  of  the  model  consistently  indicated  that  the  commander-gunner 
frequently  experienced  instances  of  high  workload.  The  greatest  of  these  workload  peaks  were 
caused  by  conflicts  between  tactical  communications  and  other  tasks  that  might  be  shared  by 
crew  members  in  three-  and  four-person  systems.  Conflicting  tasks  included  those  associated 
with  maintaining  an  awareness  and  understanding  of  the  situation  inside  and  outside  the  vehicle, 
such  as  the  task  of  scanning  for  threats  via  the  periscope  or  the  battlefield  display.  According  to 
the  Soldiers  surveyed,  information  about  the  strength,  activity,  and  location  of  dismounted 
enemy  infantry  will  be  most  critical  to  the  commander-gunner  of  the  ICV,  as  it  is  to  the 
commander and the gunner of the Bradley fighting vehicle (BFV) (Mitchell et al., 2004). In a
line-of-sight  environment,  the  closer  the  enemy  is,  the  more  critical  the  information. 
Communications  and  scanning  tasks  are  the  primary  means  for  obtaining  this  information. 

Digital  communications,  which  employ  the  same  resources  used  to  perform  the  scanning  task 
(i.e.,  visual,  motor,  and  cognitive),  were  found  to  overload  the  commander-gunner  more  often 
than  voice  communications  that  do  not  rely  on  visual  resources.  Although  voice  messages  would 
be the most likely mode of communication during engagements and other periods of high
activity,  digital  communications  containing  critical  information  are  also  likely  to  be  exchanged. 

Discussions  with  Soldiers  confirmed  that  communications  often  conflict  with  target  acquisition 
and  engagement  tasks  in  the  BFV.  The  Soldiers  claimed  that,  as  in  the  BFV,  when  the 
commander-gunner  of  the  ICV  becomes  overloaded,  he  will  most  likely  ignore  those  tasks  he 
considers  less  important  and  will  focus  on  the  task  of  highest  priority.  Some  Soldiers  added  that 
during engagements, they turn off their radio. In such instances, information critical to a present
or subsequent engagement might be lost. Providing information about the presence and location
of  threats  in  a  manner  that  will  facilitate  target  acquisition  rather  than  compete  for  mental 
resources  is  expected  to  enhance  SA  and  reduce  decision  cycle  time. 

Directional  cues  on  target  location  might  be  provided  auditorily,  visually,  or  tactilely.  Auditory 
cues  can  be  presented  verbally  in  spatial  language  (e.g.,  “5  o’clock”)  or  in  3-D  audio  sounds  that 
appear  to  emanate  from  the  clock  position  of  the  target.  However,  some  of  the  Soldiers 
interviewed  expressed  concern  that  auditory  cues  might  be  lost  amid  the  din  and  frequent  verbal 
exchanges  that  are  typical  of  a  combat  vehicle  environment.  The  Soldiers  were  also  skeptical 
about  the  effectiveness  of  tactile  cues  in  conveying  position  information  while  they  are  being 
tossed  about  their  moving  vehicle.  Some  Soldiers  preferred  visual  cues  that  could  be  integrated 
into  the  sight  picture  in  such  a  manner  as  to  avoid  distraction  and  obstruction  of  the  scene  being 
scanned. 

Auditory  cues  have  been  found  to  be  useful  in  supplementing  visual  information  or  alerting  the 
listener to critical information within a visual display (Shinn-Cunningham, Lehnert, Kramer,
Wenzel,  &  Durlach,  1997),  but  the  use  of  auditory  cues  in  providing  spatial  information  about 
target  location  in  ground  combat  vehicles  has  not  been  adequately  explored.  For  the  most  part, 
research  on  spatialized  audio  has  focused  on  its  use  in  aircraft  where  3-D  audio  displays  have 
scored  a  number  of  successes.  Studies  have  shown  that  listeners  who  must  monitor  multiple 
radio  communications  can  selectively  attend  to  one  message  at  a  time  if  messages  are  presented 
in  different  spatial  locations.  In  these  studies,  dismounted  Soldiers  (Haas,  dePontbriand,  Mello, 
Patton,  &  Solounias,  2000)  and  helicopter  pilots  (Haas,  Gainer,  Wightman,  Couch,  &  Shilling, 
1997)  were  found  to  identify  and  respond  to  multi-channel  radio  communications  more  quickly 
and accurately with 3-D audio than with existing monaural displays. In the latter study, pilots
scored fewer points on a radio communications-identification task than with 3-D audio when some
speech messages were presented to one ear and the other messages to the opposite ear (i.e., dichotic
presentation). Even fewer points were scored when all speech messages were sent to both ears
(i.e., diotic presentation).

Spatial information about target location has been found to have positive effects on target
acquisition performance and perceptions of workload (Begault, 1993; McKinley, Erickson, &
D’Angelo, 1994; McKinley et al., 1995). In one study, commercial airline crew members
acquired targets faster using a 3-D audio display than did crew members using a one-earpiece
headset; however, no significant differences were found between these auditory displays in the
number  of  targets  acquired  (Begault,  1993).  In  another  investigation,  3-D  audio  cues  alone  did 
not  improve  target  localization,  but  when  paired  with  visual  cues,  the  3-D  cues  resulted  in 
improvements  in  time  and  accuracy,  reduced  head  movement,  and  lower  subjective  ratings  of 
workload  (Tannen,  2001). 

Many  studies  that  have  compared  the  effects  of  auditory  (e.g.,  3-D  audio  and  spatial  language) 
and  visual  cues  in  the  localization  of  targets  during  navigation  have  focused  primarily  on 
differences  in  spatial  updating  (i.e.,  the  ability  of  people  to  keep  track  of  the  location  of  a  target 
mentally without concurrent perceptual cues). In one such study, Loomis, Klatzky, Philbeck, and
Golledge  (1998)  found  that  distance  perception  was  more  accurate  with  visual  cues  than  with 
auditory  cues,  but  spatial  updating  was  performed  well  in  both  modalities.  In  another  study  with 
blind  and  blindfolded  sighted  observers,  Loomis,  Lippa,  Klatzky,  and  Golledge  (2002)  again 
found greater error in distance perception with 3-D audio than with spatial language (e.g.,
“5 o’clock, 10 meters”) for the latter participant group. However, directional errors were greater
in  the  spatial  language  condition  than  in  the  3-D  audio  mode.  Here,  too,  spatial  updating 
performance  was  nearly  the  same  for  both  auditory  conditions.  The  researchers  concluded  that 
once  a  target  location  is  encoded  or  represented  internally,  the  representation  can  be  updated  as 
well with either modality. In their report about the results of this research, Loomis and his
associates  present  a  two-process  model  of  the  task  of  navigating  to  a  target  using  the  two 
auditory  modes.  The  two  processes  they  identify  are  stimulus  encoding  and  spatial  update. 
According  to  the  researchers,  encoding  of  3-D  audio  sound  involves  two  substages:  perception 
of  the  spatial  location  of  the  source  and  then  the  creation  of  a  spatial  image  of  the  source 
location.  Encoding  of  a  spatial  language  stimulus  may  or  may  not  require  more  than  one 
substage,  depending  on  whether  a  spatial  image  is  formed  in  the  process  of  converting  the  verbal 
directions  into  meaning.  However,  regardless  of  whether  the  location  of  the  target  is  cued  with 
3-D  audio  sound  or  a  verbal  description,  the  result  of  encoding  these  stimuli  is  a  spatial  image 
that  continues  to  exist  after  the  stimulus  is  no  longer  present. 

The  objective  of  the  present  investigation  was  to  measure  and  compare  the  effects  of  auditory 
(speech  and  non-speech)  and  visual  cues  about  target  location  on  target  acquisition  performance 
and  attention  to  auditory  communications.  During  the  study,  the  participants  performed  target 
acquisition  tasks  while  monitoring  radio  communications  in  each  of  five  cue  conditions:  (1) 
Baseline  1,  (2)  Baseline  2,  (3)  Visual,  (4)  Spatial  Language,  and  (5)  3-D  Audio.  Baseline  1 
represented  current  limitations  in  targeting  information.  In  this  condition,  the  participant  was  not 
provided  any  information  about  the  presence  or  location  of  targets.  In  Baseline  2,  the  participant 
was  provided  an  auditory  alert  (bell)  when  a  target  was  presented  but  he  did  not  receive  any 
information  about  where  the  target  was  located.  In  the  Visual  mode,  target  location  cues  were 
provided  by  an  icon  that  resembled  a  one-handed  clock  without  numbers.  In  the  Spatial 
Language mode, cues were presented verbally as clock positions (e.g., “Target! ... 3 o’clock”).
In the 3-D Audio mode, target location was cued by two broadband audio tones that appeared to
emanate  from  the  position  of  the  target. 

For the current study, it was hypothesized that target detection times would be significantly shorter
in modes in which information about target location was provided. These improvements in target
acquisition  performance,  however,  were  expected  to  be  greatest  in  the  Visual  mode  for  two 
reasons.  First,  the  process  of  converting  the  3-D  audio  and  spatial  language  stimuli  into  spatial 
images  may  involve  more  substages  than  the  visual  stimulus.  Second,  auditory  resources  used  in 
listening  to  radio  communications  would  not  compete  for  the  visual  resources  used  in  the 
perception of the visual cues. For the same reason, it was also expected that the participants
would be able to attend better to radio communications in the Visual mode. The verbal cues
provided  in  Spatial  Language  were  expected  to  be  more  disruptive  to  this  latter  task  than  the  3-D 
Audio  cues  that  were  conveyed  in  sounds  and  not  words. 


2.  Objectives 


The objective of this study was to measure and compare the effects of auditory (speech and
non-speech) and visual cues about target location on target acquisition performance and attention to
auditory  communications. 


3.  Method 


3.1  Participants 

The participants were 20 male Soldiers who ranged in age from 22 to 41 years (mean = 30.6 years;
standard deviation [SD] = 6.5 years). Their time in service ranged from 1.9 to 22.2 years (mean =
11.1 years; SD = 6.2 years), and their time in their military occupational specialty (MOS) was
similar. Most of the participants were commanders or gunners of the BFV or the M1 tank, with an
MOS of 19D or 19K, respectively. Fifteen of the 20 participants had seen combat during
Operation Desert Storm and/or Operation Iraqi Freedom.

All  the  participants  passed  tests  of  color  vision  and  met  visual  acuity  requirements  of  20/20  in 
one  eye  and  20/30  in  the  other  eye,  corrected  or  uncorrected.  The  hearing  threshold  levels  (HTL) 
of the participants corresponded to Army physical profile H2, which specifies an average HTL of
no more than 30 dB, no individual HTL greater than 35 dB at 500, 1000, and 2000 Hz, and no
HTL greater than 55 dB at 4000 Hz (U.S. Army, 1991). The participants had otoscopically
normal ears (i.e., no blockage or infection) and, as reported by the participant, no history of
otologic pathology (i.e., hearing problems).
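The H2 limits above can be expressed as a simple screening check. The sketch below is illustrative only: it assumes the 30-dB average is taken over 500, 1000, and 2000 Hz and that the limits apply to each ear separately (the usual reading of the profile, not something stated in this report), and the function names are hypothetical.

```python
from statistics import mean

# Frequencies (Hz) covered by the 30-dB average and the 35-dB individual limit.
# Averaging over exactly these three frequencies is an assumption of this sketch.
AVERAGED_FREQS_HZ = (500, 1000, 2000)


def ear_meets_h2(htl_db: dict) -> bool:
    """Check one ear's hearing threshold levels (Hz -> dB) against the H2 limits."""
    low = [htl_db[f] for f in AVERAGED_FREQS_HZ]
    return (
        mean(low) <= 30.0            # average HTL no more than 30 dB
        and max(low) <= 35.0         # no individual HTL above 35 dB
        and htl_db[4000] <= 55.0     # no HTL above 55 dB at 4000 Hz
    )


def participant_meets_h2(left_ear: dict, right_ear: dict) -> bool:
    """Hypothetical rule: both ears must satisfy the H2 limits."""
    return ear_meets_h2(left_ear) and ear_meets_h2(right_ear)


if __name__ == "__main__":
    # Made-up audiogram, for illustration only.
    ear = {500: 10, 1000: 15, 2000: 20, 4000: 40}
    print(participant_meets_h2(ear, ear))  # True
```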
The voluntary, fully informed consent of the persons used in this research was obtained as
required  by  32  Code  of  Federal  Regulations  219  and  Army  Regulation  (AR)  70-25.  The 
investigators  have  adhered  to  the  policies  for  the  protection  of  human  subjects  as  prescribed  in 
AR  70-25. 

3.2  Apparatus 

3.2.1  Control  Station  and  Target  Scenario 

The participant’s control station consisted of a 17-inch¹ Trinitron monitor manufactured by Dell
and a joystick manufactured by Saitek (Cyborg 3-D Rumble Force Stick). The participant was
seated approximately 25 inches from the monitor (i.e., seat reference point to screen). The
monitor presented a 10-degree horizontal field of view (FOV)¹ of the 360-degree field around an
imaginary  vehicle  in  which  the  participant  was  operating.  The  joystick  controlled  the  movement 
of  the  scene  behind  crosshairs  that  were  fixed  in  the  center  of  the  visual  display.  The  participant 
scanned  the  terrain  around  the  vehicle  by  twisting  the  joystick  to  the  left  or  the  right.  The  farther 
the  hand  control  was  twisted,  the  faster  the  movement  of  the  target  scene.  Movement  of  the 
scene  behind  the  crosshairs  was  limited  to  the  horizontal  plane.  Each  target  was  an  individual 
dismounted  Soldier  who  was  presumed  to  be  an  enemy  (see  figure  1).  This  choice  of  target  type 
was  based  on  a  prioritized  list  of  CIRs  and  related  threats  identified  by  Soldiers.  All  personnel 
targets  were  situated  at  a  distance  of  75  meters  from  the  participant’s  vehicle.  The  targets  were 
equal  in  size  and  presented  along  the  vertical  centerline  of  the  visual  display.  A  hit  on  the  target 
was  recorded  when  the  trigger  on  the  joystick  was  pulled  while  the  crosshairs  were  on  any 
portion  of  the  target.  The  target  fell  to  the  ground  to  indicate  to  the  participant  that  a  hit  had 
been scored. The DiGuy Scenario² (Version 5.2.3) developed by Boston Dynamics was used in
the  development  of  the  target  scenarios,  the  presentation  of  the  target  location  cues,  the 
interpretation  of  the  input  from  the  joystick,  and  data  collection. 
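Because scene movement was limited to the horizontal plane and every target stood on the vertical centerline, scoring a hit reduces to comparing the crosshair azimuth with the target azimuth. The sketch below is not the DiGuy Scenario implementation; it assumes azimuths in degrees measured clockwise from 12 o'clock and a hypothetical 0.5-m target width (the report gives no target dimensions) at the stated 75-m range. The wrapped offset it computes is one plausible way to obtain "degrees off target center."

```python
import math

TARGET_RANGE_M = 75.0          # stated in section 3.2.1
ASSUMED_TARGET_WIDTH_M = 0.5   # hypothetical; not reported in the study


def angular_offset_deg(crosshair_az: float, target_az: float) -> float:
    """Signed crosshair-minus-target offset in degrees, wrapped to [-180, 180)."""
    return (crosshair_az - target_az + 180.0) % 360.0 - 180.0


def target_half_width_deg(width_m: float = ASSUMED_TARGET_WIDTH_M,
                          range_m: float = TARGET_RANGE_M) -> float:
    """Half the horizontal angle subtended by a target of the given width."""
    return math.degrees(math.atan((width_m / 2.0) / range_m))


def is_hit(crosshair_az: float, target_az: float) -> bool:
    """A trigger pull counts as a hit if the crosshairs lie on any part of the target."""
    return abs(angular_offset_deg(crosshair_az, target_az)) <= target_half_width_deg()


if __name__ == "__main__":
    print(round(target_half_width_deg(), 3))   # 0.191 degrees for a 0.5-m target at 75 m
    print(angular_offset_deg(359.0, 1.0))      # -2.0: the offset wraps across 12 o'clock
```

Wrapping the difference matters because a target at 1 degree and crosshairs at 359 degrees are only 2 degrees apart, not 358.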

3.2.2  Target  Location  Cues 

Cues  about  the  location  of  targets  were  provided  in  the  visual,  spatial  language  (speech),  and  3-D 
audio (non-speech) modes. All cues were 2.5 seconds in duration, and their presentation was
controlled by a computer. Each cue was presented once per target presentation and indicated the
target direction relative to the 12-o’clock position. The following paragraphs describe these cues and the
apparatus  that  was  used  to  present  them. 


¹This was based on discussions with United Defense Limited Partnership (UDLP), which is responsible for the
design of the crew station in the ICV. UDLP provided information about the FOV of the commander’s independent
viewer (daylight TV sensor) in the BFV A3 (i.e., wide FOV [WFOV]: 10 degrees x 7.5 degrees) and a best guess
about the FOV of the sight in the ICV (WFOV: 9 degrees circular). At the time of this study, no decision had been
made regarding the FOV of the commander-gunner’s sight in the ICV or the size of the flat panel on which the
sight image would be displayed. A best guess was that the sight image would be presented on one of the main
15-inch square flat panel displays and, if desired, on a smaller square flat panel called the Crewman’s Remote
Interface System (CRIS).

²DiGuy Scenario is a trademark of Boston Dynamics.
Figure 1. Dismounted enemy soldier within sight image.

(1)  Visual.  The  visual  cues  about  target  location  were  provided  by  an  icon  that  resembled  a 
one-handed  clock  without  numbers  (see  figure  2).  The  direction  in  which  the  hand  on  the  clock 
was  pointing  indicated  the  location  of  the  target  within  the  360-degree  field  about  the  vehicle 
platform. As with the cues presented in 3-D audio (non-speech) and spatial language (speech), directions
were given in whole clock hours. Ten clock positions were used. No targets
were  presented  at  the  12-  or  the  6-o’clock  positions,  partly  because  of  front-back  reversals  that  can 
occur  when  these  cues  are  presented  in  the  3-D  audio  mode  (Begault,  1991).  The  visual  icon  was 
2.5  x  2.5  inches  in  size  and  centered  at  the  bottom  of  the  scene  displayed  on  the  17-inch  monitor. 


Figure 2. Icon providing directional cue in the visual modality.

(2)  Spatial  Language  (speech).  These  cues  about  target  location  were  verbal  and  were 
presented in a clock-type format. An example of this type of cue is “Target! ... 5 o’clock.”

As  with  the  3-D  audio  cue,  the  total  duration  of  each  cue  in  the  spatial  language  mode  was 
2.5  seconds.  The  first  and  second  parts  of  the  cue  were  each  approximately  1.0  second  in 
duration, separated by a 0.5-second pause. All spoken directional cues were pre-recorded in a female
voice, in contrast to the radio communications, which were presented in a male voice.

(3) 3-D Audio (non-speech). These cues about target location were broadband non-speech
audio signals. Each audio signal consisted of two 1-second tones that were presented
approximately 0.5 second apart, for a total duration of 2.5 seconds. The tones were spatialized
with a 3-D virtual audio localization (3-DVALS) system manufactured by Veridian Engineering
and played on a computer with a generic head-related transfer function recorded on a Knowles
Electronics Manikin for Acoustic Research (KEMAR).
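Both auditory cue modes share the same two-part timing: two 1-second segments separated by a 0.5-second gap, so 1.0 + 0.5 + 1.0 = 2.5 seconds, matching the 2.5-second visual cue. The sketch below only lays out that timeline. The `Segment` type and the segment labels are hypothetical, and the 3-D audio gap is taken as exactly 0.5 second although the report says "approximately."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str
    start_s: float
    duration_s: float


def two_part_cue(first: str, second: str, part_s: float = 1.0, gap_s: float = 0.5) -> list:
    """Two equal-length parts separated by a silent gap."""
    return [Segment(first, 0.0, part_s), Segment(second, part_s + gap_s, part_s)]


def total_duration_s(segments: list) -> float:
    """End time of the last segment, measured from cue onset."""
    return max(s.start_s + s.duration_s for s in segments)


def spatial_language_cue(hour: int) -> list:
    # Spoken in a female voice in the study; represented here as text only.
    return two_part_cue("Target!", f"{hour} o'clock")


def audio_3d_cue(hour: int) -> list:
    # Both tones would be spatialized toward the same clock position.
    return two_part_cue(f"tone 1 @ {hour} o'clock", f"tone 2 @ {hour} o'clock")


if __name__ == "__main__":
    for cue in (spatial_language_cue(5), audio_3d_cue(5)):
        print([s.label for s in cue], total_duration_s(cue))
```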

All  cues  about  the  presence  and  location  of  targets  and  other  auditory  communications  were 
presented to both ears of the participant through stereo earphones manufactured by Sony. All
were normalized to 21 dB, and noise reduction was applied. The intensity levels of the auditory
cues  about  target  location,  as  measured  through  an  artificial  ear,  were  73  peak  decibels  for  the 
Spatial  Language  cue  and  78  peak  decibels  for  the  3-D  audio  cue.  The  decibel  peaks  for  the 
auditory  communications  ranged  from  65  to  70. 
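The clock-face convention shared by all three cue modes maps directly onto azimuth. The sketch below assumes azimuth is measured clockwise from the 12-o'clock position in 30-degree steps per hour; the function names and the icon geometry are illustrative and are not taken from the study software. The mirror function shows what a front-back reversal does to a clock cue: the listener confuses a position with its reflection across the left-right (3-9 o'clock) axis, so 1 o'clock may be heard as 5 o'clock, and 12 and 6 o'clock are each other's reflection.

```python
import math

# The ten clock positions used for targets; 12 and 6 o'clock were excluded.
CUE_POSITIONS = tuple(h for h in range(1, 12) if h != 6)


def clock_to_azimuth_deg(hour: int) -> int:
    """Clockwise azimuth from 12 o'clock; each clock hour spans 30 degrees."""
    if not 1 <= hour <= 12:
        raise ValueError(f"clock hour must be 1-12, got {hour}")
    return (hour % 12) * 30


def front_back_mirror(hour: int) -> int:
    """Clock position reflected across the 3-9 o'clock (left-right) axis."""
    mirrored = (6 - hour) % 12
    return 12 if mirrored == 0 else mirrored


def hand_endpoint(hour: int, length: float = 1.0) -> tuple:
    """(x, y) tip of the icon's single hand, with +y pointing toward 12 o'clock."""
    az = math.radians(clock_to_azimuth_deg(hour))
    return (length * math.sin(az), length * math.cos(az))


if __name__ == "__main__":
    print([clock_to_azimuth_deg(h) for h in CUE_POSITIONS])
    # [30, 60, 90, 120, 150, 210, 240, 270, 300, 330]
```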

3.2.3  Radio  Communications  and  Questionnaires 

During  each  target  run  in  each  cue  condition,  the  participant  was  presented  communications 
through  his  headphones  that  simulated  tactical  information  transmitted  from  command 
headquarters.  These  communications  were  in  the  form  of  a  situation  report  (SITREP),  although 
much  longer  and  more  detailed  than  normal.  An  example  of  this  SITREP  is  provided  in 
appendix  A.  The  pre-recorded  SITREP  was  the  same  duration  as  the  target  run  (i.e.,  2.5  minutes) 
and  contained  27  different  facts.  Ten  of  the  27  facts  were  changed  at  each  presentation  of  the 
SITREP.  A  total  of  25  SITREPs  was  prepared  and  pre-recorded:  20  SITREPs  for  testing  (i.e., 
four  target  runs  in  each  of  the  five  experimental  conditions)  and  five  SITREPs  for  training  (i.e., 
two  target  runs  in  each  condition).  After  the  completion  of  each  target  run,  the  participant  was 
asked  to  complete  a  questionnaire  that  consisted  of  ten  questions  pertaining  to  the  information 
contained  in  the  SITREP.  The  answers  to  each  question  were  written  and  required  a  one-  or  two- 
word  response.  Each  answer  was  worth  two  points.  If  the  participant  did  not  provide  an  answer  to 
a  question  or  the  answer  was  wrong,  the  participant  scored  zero  points.  If  the  participant  omitted  a 
word  from  an  answer  that  required  two  words  or  if  one  of  the  words  in  the  answer  was  wrong,  the 
participant  scored  one  point.  The  participant  had  a  maximum  of  3  minutes  to  answer  the  ten 
questions. The SITREPs and associated questionnaires presented after each run were
counterbalanced among the five experimental conditions. An example of a SITREP questionnaire is
provided in appendix B.
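The scoring rules above (2 points for a fully correct answer, 1 point when one word of a two-word answer is missing or wrong, 0 otherwise) can be written as a small function, giving a maximum of 20 points per 10-question questionnaire. This is a sketch only: it assumes a word is correct when it matches the answer key exactly, ignoring case, whereas the study's scorers may have exercised judgment. The function names and the example answers are hypothetical.

```python
from typing import Optional, Sequence

POINTS_PER_QUESTION = 2


def score_answer(response: Optional[str], key: str) -> int:
    """Score one written answer against a one- or two-word answer key."""
    key_words = set(key.lower().split())
    if len(key_words) not in (1, 2):
        raise ValueError("answer keys are expected to be one or two words")
    response_words = set((response or "").lower().split())
    matched = len(key_words & response_words)
    # One-word key: 0 or 2 points. Two-word key: 1 point per correct word.
    return POINTS_PER_QUESTION * matched // len(key_words)


def score_questionnaire(responses: Sequence[Optional[str]], keys: Sequence[str]) -> int:
    """Total points for one questionnaire; a blank answer is passed as None."""
    if len(responses) != len(keys):
        raise ValueError("one response (or None) is needed per question")
    return sum(score_answer(r, k) for r, k in zip(responses, keys))


if __name__ == "__main__":
    # Invented answers: correct one-word, half-correct two-word, blank.
    print(score_questionnaire(["bravo", "hill", None], ["Bravo", "Hill 402", "0600"]))  # 3
```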

3.2.4  The  National  Aeronautics  and  Space  Administration  (NASA)  Task  Load  Index  (TLX) 

The  NASA-TLX  was  used  to  assess  the  participant’s  experience  of  workload  (Hart  &  Staveland, 
1988).  This  technique  uses  rating  scales  to  assess  mental,  physical,  and  temporal  demands, 
performance, effort, and frustration. Initially, each of these six workload factors is assigned a
weight based on the participant’s responses to pairwise comparisons. In these comparisons,
the six factors are presented in all 15 possible pairs, and for each pair, the participant is asked to
circle the factor that he or she perceived as contributing more to his or her workload experience. The
participant  then  completes  rating  scales  that  provide  a  measure  of  the  magnitude  of  the  workload 
for  each  factor.  Those  factors  perceived  by  the  participant  to  have  contributed  most  to  his  or  her 
workload  experience  are  given  more  weight  in  the  computation  of  an  overall  workload  score. 

The  paired  comparisons  worksheets  and  the  workload  rating  scale  are  provided  in  appendix  C. 
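The weighting procedure is commonly computed as follows: a factor's weight is the number of the 15 pairs in which it was circled (0 to 5, summing to 15), and the overall score is the weighted mean of the six ratings. The sketch assumes ratings on the conventional 0-100 scale; the report does not state the scale or the exact computation used, and the factor keys are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence

FACTORS = ("mental", "physical", "temporal", "performance", "effort", "frustration")
PAIRS = list(combinations(FACTORS, 2))  # the 15 pairwise comparisons


def tlx_weights(choices: Sequence[str]) -> Dict[str, int]:
    """choices[i] is the factor circled for PAIRS[i]; weight = times circled."""
    if len(choices) != len(PAIRS):
        raise ValueError(f"expected {len(PAIRS)} choices, got {len(choices)}")
    for (a, b), chosen in zip(PAIRS, choices):
        if chosen not in (a, b):
            raise ValueError(f"{chosen!r} is not one of the pair ({a}, {b})")
    counts = Counter(choices)
    return {f: counts[f] for f in FACTORS}


def tlx_overall(ratings: Dict[str, float], weights: Dict[str, int]) -> float:
    """Weighted mean workload: sum(weight * rating) / sum(weights)."""
    total_weight = sum(weights.values())  # 15 for a complete set of comparisons
    return sum(weights[f] * ratings[f] for f in FACTORS) / total_weight


if __name__ == "__main__":
    # Invented responses: the first factor of every pair is circled.
    w = tlx_weights([a for a, _ in PAIRS])
    r = {"mental": 80, "physical": 20, "temporal": 60,
         "performance": 40, "effort": 70, "frustration": 30}
    print(w, tlx_overall(r, w))  # weights 5, 4, 3, 2, 1, 0; overall 54.0
```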


4.  Procedures 


4.1  Experimental  Design 

The study used a repeated-measures, fixed-factor design in which five cue conditions were evaluated.
These conditions, which formed the levels of the independent variable, were (1) Baseline 1,
(2) Baseline 2, (3) Visual, (4) Spatial Language, and (5) 3-D Audio. Cues about target location were provided in
the  Visual,  Spatial  Language,  and  3-D  Audio  modes  but  not  in  the  two  baseline  conditions. 
Baseline  1  represented  current  limitations  in  targeting  information  where  the  participant 
continuously  scanned  the  terrain  around  the  vehicle  for  threats  without  knowing  whether  threats 
were  present  or  where  they  might  be  located.  In  Baseline  2,  the  participant  was  provided  an 
auditory  alert  (bell)  when  a  target  was  presented,  but  he  did  not  receive  any  information  about 
the location of the target. The primary purpose of the second baseline condition was to
determine whether the mere knowledge of the presence of a target would affect performance of the
target acquisition task and perceptions of workload. In all conditions except Baseline 1, the
participant’s  crosshairs  automatically  returned  to  the  12-o’clock  position  after  each  target 
presentation.  The  participant  could  not  move  his  crosshairs  from  that  position  until  cued  about 
the  presence  or  location  of  another  target. 

The  primary  task  of  the  participants  was  to  find  and  engage  targets  as  quickly  as  possible.  Their 
secondary  task  was  to  attend  to  tactical  information  contained  in  SITREPs  that  were  presented 
auditorily  throughout  each  target  run.  The  dependent  variables  in  this  study  included  measures 
of  primary  and  secondary  task  performance  and  subjective  ratings  of  workload.  In  the  target 
acquisition  task,  the  dependent  variables  were  time  to  first  shot,  the  degrees  off  target  center  at 
first  shot,  and  the  percentage  of  hits.  For  those  targets  hit,  time  to  hit  and  the  degrees  off  target 
center  at  hit  were  also  recorded.  The  time  to  first  shot  and  the  time  to  hit  were  calculated  from 
the  time  at  which  the  target  was  presented  to  the  time  of  trigger  pull.  The  dependent  variable  in 
the  secondary  task  was  the  total  number  of  points  scored  on  the  SITREP  questionnaires  that  were 
administered  after  each  target  run  in  each  condition.  Overall  workload  scores  were  derived  with 
the  NASA-TLX. 
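As a sketch of how these measures could be derived from a per-target log, the code below computes them for one target presentation, plus the percentage of hits over a set of presentations. The `Shot` and `Trial` records are hypothetical (the report does not describe the DiGuy Scenario data format), and reporting the magnitude of the offset rather than its sign is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shot:
    time_s: float        # time of trigger pull (same clock as target onset)
    offset_deg: float    # signed degrees off target center at trigger pull
    hit: bool


@dataclass
class Trial:
    onset_s: float                                   # time the target was presented
    shots: List[Shot] = field(default_factory=list)


@dataclass
class TrialMeasures:
    time_to_first_shot_s: Optional[float]
    offset_at_first_shot_deg: Optional[float]
    time_to_hit_s: Optional[float]
    offset_at_hit_deg: Optional[float]


def trial_measures(trial: Trial) -> TrialMeasures:
    """Times are measured from target presentation to trigger pull."""
    shots = sorted(trial.shots, key=lambda s: s.time_s)
    first = shots[0] if shots else None
    first_hit = next((s for s in shots if s.hit), None)
    return TrialMeasures(
        time_to_first_shot_s=None if first is None else first.time_s - trial.onset_s,
        offset_at_first_shot_deg=None if first is None else abs(first.offset_deg),
        time_to_hit_s=None if first_hit is None else first_hit.time_s - trial.onset_s,
        offset_at_hit_deg=None if first_hit is None else abs(first_hit.offset_deg),
    )


def percent_hits(trials: List[Trial]) -> float:
    """Percentage of target presentations on which a hit was scored."""
    if not trials:
        raise ValueError("no trials")
    hits = sum(any(s.hit for s in t.shots) for t in trials)
    return 100.0 * hits / len(trials)
```

For example, a target presented at 10.0 s that is missed at 13.5 s and hit at 14.0 s yields a time to first shot of 3.5 s and a time to hit of 4.0 s; a presentation with no shots contributes only to the hit percentage.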
One participant was trained and tested at a time. The duration of training and testing for each
participant  was  approximately  4.0  hours.  The  procedures  that  were  followed  for  each  participant 
are  described  in  section  4.2. 

4.2  Training 

Each  volunteer  was  briefed  about  the  purpose  of  the  investigation,  the  procedures  to  be  followed 
during the study, and any risks involved in participation. The investigator read a volunteer
agreement affidavit aloud to the participant, who followed along. If the participant agreed to
participate  in  the  investigation,  he  completed  the  information  requested  in  the  consent  form  and 
signed  it. 

Each  participant  then  completed  a  demographic  questionnaire  to  obtain  pertinent  background 
information.  A  vision  tester  manufactured  by  Titmus  Optical  Company,  Inc.  was  used  to  assess 
the  participant’s  vision  at  near  and  far  distances  to  ensure  that  the  participant  met  visual  acuity 
requirements  of  20/20  in  one  eye  and  20/30  in  the  other  eye,  corrected  or  uncorrected.  The 
participant  was  also  required  to  pass  a  test  for  color  vision.  A  hearing  test  was  administered  by 
an  audiologist  with  an  AC40  clinical  audiometer  manufactured  by  Interacoustics  A/S.  The 
participants  were  required  to  have  an  HTL  corresponding  to  Army  physical  profile  H2  (U.S. 
Army, 1991) or better, otoscopically normal ears, and no history of otologic pathology.

The  participant  received  training  in  all  tasks  to  be  performed  during  the  study  in  each  of  the  five 
experimental  conditions,  including  instruction  in  rating  his  workload  experience  using  the 
NASA-TLX.  Training  also  included  practice  in  localizing  the  3-D  audio  cues  presented  at  the 
ten clock positions. During this portion of the training, the investigator presented 3-D audio
tones at each of the ten clock positions, starting at the 1-o’clock position and ending at 11 o’clock.
After each tone, the investigator stated the clock position at which the tone was presented. This
process was repeated two more times. The investigator then presented the 3-D audio tones at
each clock position in a randomized order to the participant, who identified the clock position of
each tone. This process was repeated two additional times.
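The localization training described above amounts to three ordered demonstration passes followed by three randomized identification passes, or 60 tone presentations in all. A minimal sketch, assuming a fresh random order on each identification pass (the report does not say whether the same order was reused); the function name is hypothetical.

```python
import random
from typing import List, Optional, Tuple

CUE_POSITIONS = (1, 2, 3, 4, 5, 7, 8, 9, 10, 11)  # ten positions; 6 and 12 o'clock skipped


def localization_training(seed: Optional[int] = None) -> List[Tuple[str, int]]:
    """Return (phase, clock_hour) pairs in presentation order."""
    rng = random.Random(seed)
    # Phase 1: three passes from 1 to 11 o'clock; the investigator names each position.
    sequence = [("demonstration", h) for _ in range(3) for h in CUE_POSITIONS]
    # Phase 2: three randomized passes; the participant names each position.
    for _ in range(3):
        order = list(CUE_POSITIONS)
        rng.shuffle(order)  # a fresh shuffle per pass is an assumption
        sequence.extend(("identification", h) for h in order)
    return sequence
```

Passing a fixed `seed` makes a participant's sequence reproducible, which is convenient when the order must be logged alongside the responses.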

During  training  in  each  of  the  five  experimental  conditions,  the  participant  was  reminded  that  his 
primary  task  was  to  find  and  engage  all  targets  as  quickly  as  possible.  If  the  target  did  not  fall  at 
trigger  pull,  the  target  had  not  been  hit  and  the  participant  was  required  to  re-engage.  The 
participant  was  told  that  his  secondary  task  was  to  attend  to  the  information  contained  in  the 
SITREP. He was informed that some of the details in the SITREP would change and that he
should  not  rely  on  his  memory  of  information  contained  in  previous  SITREPs. 

Training  in  each  condition  included  the  completion  of  two  target  runs,  each  followed  by  the 
questionnaire  that  assessed  the  participant’s  knowledge  of  the  information  contained  in  the 
SITREP.  At  the  conclusion  of  training,  the  participant  received  practice  in  rating  his  workload 
experience  using  the  NASA-TLX.  The  order  in  which  the  participant  received  training  in  each 
condition was counterbalanced and presented in the same order in which the conditions would be
presented  to  him  during  the  testing  period. 

4.3  Testing 

After  a  15-minute  rest  break,  the  participant  completed  four  runs  in  each  of  the  five  experimental 
conditions. Each run consisted of five targets, for a total of 20 target presentations per condition.
The 20 targets in each condition were presented in a random order, twice at each of the ten clock positions. Five
different  random  orders  of  target  locations  were  developed  and  the  presentation  of  these  orders 
was  counterbalanced  across  the  five  experimental  conditions. 
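One way to generate such orders is sketched below: each order lists the ten clock positions twice in shuffled form and is split into four runs of five targets. The report does not say how the five orders were counterbalanced across conditions, so a cyclic Latin square is used here purely as an assumed, plausible scheme. It ensures that each order appears exactly once in each condition across every five participants, but unlike a Williams design it does not balance first-order carryover. The function names and the seed are hypothetical.

```python
import random
from typing import List

CUE_POSITIONS = (1, 2, 3, 4, 5, 7, 8, 9, 10, 11)
TARGETS_PER_RUN = 5


def random_target_order(rng: random.Random) -> List[int]:
    """20 target locations: each of the ten clock positions exactly twice, shuffled."""
    order = list(CUE_POSITIONS) * 2
    rng.shuffle(order)
    return order


def split_into_runs(order: List[int]) -> List[List[int]]:
    """Four consecutive runs of five targets each."""
    return [order[i:i + TARGETS_PER_RUN] for i in range(0, len(order), TARGETS_PER_RUN)]


def cyclic_latin_square(n: int) -> List[List[int]]:
    """Row r lists the order index used in each condition by participant group r."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


if __name__ == "__main__":
    rng = random.Random(2004)  # arbitrary seed
    orders = [random_target_order(rng) for _ in range(5)]
    square = cyclic_latin_square(5)
    participant, condition = 7, 2
    runs = split_into_runs(orders[square[participant % 5][condition]])
    print(runs)
```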

All targets were individual dismounted Soldiers, and all represented enemy personnel. Each
target was presented for a maximum of 15 seconds. If a hit was scored, the target fell to the
ground to indicate to the participant that the target had been successfully engaged. The target
disappeared from the screen after the 15 seconds had elapsed, regardless of whether a hit was
scored. The time at which the first target was presented at the start of a target run and the times
between subsequent target presentations were varied to reduce expectancy. The intervals
between target presentations were 5, 10, 15, 20, and 25 seconds, and the order in which these
intervals occurred was counterbalanced. Each interval began 15 seconds after the onset of the
preceding target, regardless of whether or when the participant had scored a hit on that target.
Each target run was therefore 2.5 minutes in duration (five 15-second target windows plus 75
seconds of intervals).
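The 2.5-minute run length follows directly from the timing rules above. The sketch below uses a hypothetical helper, `run_timeline`, and assumes that the first interval precedes the first target and that each later interval starts when the previous target's 15-second window ends; under those assumptions any ordering of the five intervals yields a 150-second run.

```python
TARGET_WINDOW_S = 15                 # each target stays up for at most 15 s
INTERVALS_S = [5, 10, 15, 20, 25]    # presented in a counterbalanced order


def run_timeline(intervals, window=TARGET_WINDOW_S):
    """Return (onset, offset) times in seconds for each target in one run.

    Assumes the first interval precedes the first target and each later interval
    begins 15 s after the previous onset, whether or not a hit was scored.
    """
    timeline = []
    clock = 0
    for gap in intervals:
        onset = clock + gap
        offset = onset + window
        timeline.append((onset, offset))
        clock = offset  # the next interval starts when this target's window closes
    return timeline


def run_duration(intervals):
    """Total run length in seconds: the end of the last target window."""
    return run_timeline(intervals)[-1][1]


print(run_timeline(INTERVALS_S))
# [(5, 20), (30, 45), (60, 75), (95, 110), (135, 150)]
print(run_duration(INTERVALS_S) / 60)  # 2.5
```

Because the run length depends only on the sum of the intervals, counterbalancing their order changes when targets appear but not how long a run lasts.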

Each  target  run  was  followed  by  the  questionnaire  that  assessed  the  participant’s  knowledge  of 
information  contained  in  the  radio  communications.  Immediately  after  testing  in  each  condition, 
the  participant  was  asked  to  rate  his  workload  experience  using  the  NASA-TLX. 

At  the  conclusion  of  testing  in  all  five  conditions,  the  participant  was  asked  to  provide  his 
opinions  and  preferences  with  regard  to  the  conditions  evaluated. 


5.  Results 


5.1  Target  Acquisition 

It  had  been  hypothesized  that  target  acquisition  times  in  the  Visual,  Spatial  Language,  and  3-D 
Audio  modes  would  be  significantly  faster  than  target  acquisition  times  in  the  two  baseline 
conditions  (Baseline  1  and  Baseline  2)  where  no  information  was  provided  about  target  location. 
These improvements in target acquisition performance were expected to be greatest in the Visual
mode for two reasons. First, the process of converting the Spatial Language and 3-D Audio cues
into spatial images about target location may involve more substages than the visual stimulus,
resulting in an increase in response time and thus time to acquire the target. Second, the visual
resources used in capturing visual cues about target location are less likely to compete with the
auditory resources used in attending to radio communications.

To test this hypothesis, separate linear mixed effects model analyses were performed on time to
first shot and time to hit. Each analysis also included the order of presentation of the cue conditions.
The mean times to first shot and mean times to target hit for each of the five experimental
conditions are shown in figures 3 and 4, respectively. The similarity between time to first shot
and time to hit within each condition suggests that, once found, most targets were hit on the
first shot.

The results of the analysis on time to first shot indicated a significant main effect of mode
(F(4, 472) = 55.596, p < .001). The analysis on time to target hit also revealed differences between
cue conditions (F(4, 472) = 55.837, p < .001).


Figure  3.  Mean  time  to  first  shot. 


Figure  4.  Mean  time  to  target  hit. 
The results of post hoc analyses using the least significant difference (LSD) method are provided
for time to first shot and time to hit in tables 1 and 2, respectively. These analyses revealed that,
as hypothesized, time to first shot and time to target hit were significantly faster in the Visual,
Spatial Language, and 3-D Audio modes than in either of the two baseline conditions. Time to
first shot and time to target hit were significantly slower in the Spatial Language mode than in
the Visual and 3-D Audio modes, but no significant differences were found between the Visual
and the 3-D Audio modes. The analyses also indicated that time to first shot and time to hit were
significantly faster in the Baseline 1 condition than in Baseline 2.


Table 1. Mean difference between modes in time to first shot.

Mode              Baseline 2          Visual              Spatial Language    3-D Audio
Baseline 1        -0.857 (p < .001)   1.766 (p < .001)    0.741 (p = .001)    1.712 (p < .001)
Baseline 2                            2.623 (p < .001)    1.598 (p < .001)    2.569 (p < .001)
Visual                                                    -1.025 (p < .001)   -0.054 (p = .800)
Spatial Language                                                              0.971 (p < .001)

Values are mean differences in seconds (row mode minus column mode). All differences are
significant (p < .05) except Visual vs. 3-D Audio.
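Because each entry in Table 1 is a difference between two condition means, every cell can be derived from the Baseline 1 row. The sketch below assumes the sign convention is row mode minus column mode, which is consistent with the negative Baseline 1 vs. Baseline 2 entry given that Baseline 1 was faster; `pairwise_difference` is a hypothetical helper for checking the table, not part of the original analysis.

```python
# Baseline 1 row of Table 1 (seconds): Baseline 1 minus each column mode.
FROM_BASELINE_1 = {
    "Baseline 1": 0.0,
    "Baseline 2": -0.857,
    "Visual": 1.766,
    "Spatial Language": 0.741,
    "3-D Audio": 1.712,
}


def pairwise_difference(row, col, first_row=FROM_BASELINE_1):
    """Return mean(row) - mean(col) using only the Baseline 1 row.

    (B1 - col) - (B1 - row) = row - col, so the other cells are implied.
    """
    return first_row[col] - first_row[row]


print(round(pairwise_difference("Baseline 2", "Visual"), 3))           # 2.623
print(round(pairwise_difference("Visual", "3-D Audio"), 3))            # -0.054
print(round(pairwise_difference("Spatial Language", "3-D Audio"), 3))  # 0.971
```

The same check applied to the other tables reproduces their cells to within rounding in the last reported digit.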


Table 2. Mean difference between modes in time to hit.

Mode              Baseline 2          Visual              Spatial Language    3-D Audio
Baseline 1        -0.793 (p < .001)   1.854 (p < .001)    0.819 (p < .001)    1.756 (p < .001)
Baseline 2                            2.647 (p < .001)    1.612 (p < .001)    2.548 (p < .001)
Visual                                                    -1.035 (p < .001)   -0.098 (p = .647)
Spatial Language                                                              0.936 (p < .001)

Values are mean differences in seconds (row mode minus column mode). All differences are
significant (p < .05) except Visual vs. 3-D Audio.


Figure 5 shows the mean time to first shot and the mean time to hit based on the distance (clock
position) of the target from the 12-o’clock position across the five conditions. Generally, the
farther the target was from the 12-o’clock position, the greater the slewing distance and thus the
longer the time to acquire targets.
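The slewing distance that drives this trend can be made concrete. A minimal sketch, assuming the gun points at 12 o'clock when a target appears and slews the shorter way around; `slew_angle` is a hypothetical helper, and the mapping of 30 degrees per clock hour is simply 360 degrees divided by 12 positions.

```python
DEGREES_PER_HOUR = 360 // 12  # 30 degrees between adjacent clock positions


def slew_angle(clock_position):
    """Smallest turret rotation (degrees) from 12 o'clock to the given clock position.

    Assumes the gun is at 12 o'clock when the target appears and slews the shorter way.
    """
    if not 1 <= clock_position <= 12:
        raise ValueError("clock position must be between 1 and 12")
    hours_away = clock_position % 12
    return min(hours_away, 12 - hours_away) * DEGREES_PER_HOUR


for position in (12, 1, 3, 5, 6, 7, 9, 11):
    print(position, slew_angle(position))
# 12 0, 1 30, 3 90, 5 150, 6 180, 7 150, 9 90, 11 30 (one pair per line)
```

Under this assumption, positions mirrored about the 12 to 6 o'clock axis (for example, 3 and 9 o'clock) require the same rotation, which is why distance from 12 o'clock rather than the clock number itself orders the acquisition times.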
Figure 5. Time to first shot and time to target hit by target location.
The mean degrees off target center at first shot and the mean degrees off target center at hit for
each of the five experimental conditions are shown in figures 6 and 7, respectively. The results
of the linear mixed effects model analyses that were performed on these data did not reveal
significant differences between cue conditions for either the mean degrees off target center at
first shot (F(4, 472) = 2.139, p = .075) or the mean degrees off target center at hit
(F(4, 72) = .924, p = .445).
Figure 6. Mean degrees off target center at first shot.
Figure 7. Mean degrees off target center at target hit.

The percentage of hits achieved in each of the five cue conditions is shown in figure 8. The
results of the linear mixed effects model analysis indicated a significant difference between
modes on this measure of target acquisition performance (F(4, 72) = 5.220, p = .001). Post hoc
analyses indicated that the percentage of hits achieved in the Visual, Spatial Language, and 3-D
Audio modes (100%) was greater than that achieved in Baseline 1 (94%) and Baseline 2 (95%).
No significant differences were found between Baseline 1 and Baseline 2 or
among the Visual, Spatial Language, and 3-D Audio modes.


Figure  8.  Percentage  of  target  hits. 

5.2  Secondary  Task  Performance  (scores  on  SITREP  questionnaire) 

It  had  been  hypothesized  that  the  participants  would  achieve  higher  scores  on  questionnaires  about 
information  contained  in  the  SITREP  in  the  Visual  mode  because  the  visual  cues  about  target 
location  would  not  compete  for  auditory  resources  used  when  the  Soldiers  attended  to  the  radio 
communications.  The  verbal  cues  provided  in  the  Spatial  Language  mode  were  expected  to  be  more 
disruptive  to  the  secondary  task  than  3-D  Audio  cues  that  were  conveyed  in  sounds  and  not  words. 
To test this hypothesis, a linear mixed effects model analysis was performed on the total number of
points scored on the SITREP questionnaires in each of the five experimental conditions. The results
of the analysis revealed a significant difference between cue conditions (F(4, 72) = 3.467, p = .012).
Post  hoc  analyses  indicated  that  significantly  less  information  was  recalled  from  the  SITREPs  in 
Baseline  1  than  in  the  other  four  cue  conditions  (see  table  3).  No  differences  were  found  between 
Baseline  2  and  the  Visual,  Spatial  Language,  or  3-D  Audio  modes. 


Figure  9.  Mean  scores  on  SITREP  questionnaire. 
Table 3. Mean difference between modes in scores on SITREP questionnaires.

Mode              Baseline 2          Visual              Spatial Language    3-D Audio
Baseline 1        -6.60 (p = .008)    -6.55 (p = .008)    -7.60 (p = .002)    -7.25 (p = .004)
Baseline 2                            0.05 (p = .983)     -1.00 (p = .678)    -0.65 (p = .787)
Visual                                                    -1.05 (p = .663)    -0.70 (p = .771)
Spatial Language                                                              0.35 (p = .885)

Values are mean differences in questionnaire points (row mode minus column mode). Only the
differences between Baseline 1 and each of the other four conditions are significant (p < .05).


5.3  Subjective  Workload 

The results of a linear mixed effects model analysis on subjective ratings of workload on each of
the six workload dimensions of the NASA-TLX (i.e., mental, physical, and temporal demand,
performance, effort, and frustration) did not reveal a significant difference between any of the five
cue conditions. However, significant differences were found between modes on overall workload
scores computed from the weighted ratings on the six workload dimensions (F(4, 72) = 3.036,
p = .023). The mean overall workload score for each condition is shown in figure 10. The results
of post hoc analyses, shown in table 4, revealed that overall workload scores were significantly
lower in the 3-D Audio mode than in all other conditions except the Visual mode.
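The section does not spell out how the weighted overall score was computed. A minimal sketch, assuming the standard NASA-TLX procedure: each of the six dimensions is weighted by the number of times it was chosen in 15 pairwise comparisons, and the weighted ratings are averaged over the total weight of 15. The function names, the 0 to 100 rating range, and the example responses are hypothetical.

```python
from itertools import combinations

DIMENSIONS = ["mental", "physical", "temporal", "performance", "effort", "frustration"]
PAIRS = list(combinations(DIMENSIONS, 2))  # the 15 pairwise comparisons


def weights_from_choices(choices):
    """Tally how often each dimension was chosen as the larger contributor to workload."""
    if len(choices) != len(PAIRS):
        raise ValueError(f"expected {len(PAIRS)} choices, got {len(choices)}")
    weights = dict.fromkeys(DIMENSIONS, 0)
    for (a, b), chosen in zip(PAIRS, choices):
        if chosen not in (a, b):
            raise ValueError(f"{chosen!r} is not one of {a!r}, {b!r}")
        weights[chosen] += 1
    return weights


def overall_workload(ratings, weights):
    """Weighted average of the six ratings (weights normally sum to 15)."""
    total = sum(weights.values())
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / total


# Hypothetical responses: always pick the first member of each pair.
weights = weights_from_choices([a for a, _ in PAIRS])
ratings = {"mental": 70, "physical": 20, "temporal": 60,
           "performance": 30, "effort": 50, "frustration": 40}
print(weights)
print(overall_workload(ratings, weights))  # 48.0
```

Weighting by pairwise choices lets each participant's own view of which demands mattered shape the composite, which is one reason the overall score can differ between modes even when no single dimension does.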


Figure  10.  Overall  workload  scores. 


Table 4. Mean difference between modes in overall workload scores.

Mode              Baseline 2          Visual              Spatial Language    3-D Audio
Baseline 1        1.61 (p = .459)     5.11 (p = .021)     1.83 (p = .400)     6.41 (p = .004)
Baseline 2                            3.50 (p = .110)     0.22 (p = .919)     4.80 (p = .030)
Visual                                                    -3.28 (p = .134)    1.30 (p = .550)
Spatial Language                                                              4.58 (p = .038)

Values are mean differences in overall workload score (row mode minus column mode). Significant
differences (p < .05): Baseline 1 vs. Visual, Baseline 1 vs. 3-D Audio, Baseline 2 vs. 3-D Audio,
and Spatial Language vs. 3-D Audio.
5.4 Participants’ Preferences and Comments

When  asked  about  which  condition  they  thought  was  best  for  acquiring  targets  while  they 
attended  to  radio  communications,  more  than  half  of  the  participants  (53%)  selected  the  Visual 
mode.  The  preferences  of  the  remaining  participants  were  split  among  the  3-D  Audio  (18%), 
Spatial  Language  (12%),  Baseline  1  (12%),  and  the  Baseline  2  (5%)  conditions.  Most  of  those 
who  preferred  the  Visual  mode  believed  that  the  visual  cues  were  more  straightforward,  required 
less  thought  in  determining  target  location,  and  did  not  interfere  with  the  radio  communications. 
The  reasons  some  gave  for  preferring  the  Spatial  Language  mode  were  that  it  “gets  your 
attention”  or  that  “it  is  what  I’m  used  to.”  The  latter  reason  was  also  given  by  one  participant 
who  preferred  the  Baseline  1  condition. 

When  asked  about  which  condition  they  thought  to  be  the  worst  for  acquiring  targets  while  they 
attended  to  the  radio  communications,  half  of  the  participants  (50%)  selected  the  Baseline  1 
condition.  Eighteen  percent  of  the  participants  believed  the  Spatial  Language  mode  was  the 
worst,  and  another  18%  chose  Baseline  2.  Six  percent  of  the  participants  thought  the  Visual 
mode  was  the  worst,  and  another  6%  chose  the  3-D  Audio  mode.  Many  of  those  who  thought 
the  Baseline  1  condition  was  the  worst  noted  the  lack  of  information  about  target  location  and  the 
need  to  scan  continuously  for  targets.  Most  who  disliked  the  Baseline  2  condition  also  noted  the 
lack  of  information  about  target  location.  Some  considered  the  auditory  alert  to  be  “useless,” 
claiming  that  they  found  themselves  merely  waiting  for  the  auditory  alert  and  not  focusing  on  the 
radio  transmissions.  Others  claimed  that  there  were  times  when  they  were  slow  in  responding  to 
the alert. Interference with the radio transmissions and distraction were the reasons given most
often for disliking the Spatial Language and 3-D Audio cues. Participants also expressed a
lack  of  confidence  in  their  ability  to  localize  the  3-D  Audio  sounds  to  specific  clock  positions, 
although  all  but  one  participant  performed  well  in  pre-tests.  One  participant  who  believed  the 
Visual  mode  was  the  worst  claimed  that  determining  the  location  of  the  target  “took  too  many 
steps”  and  required  him  to  “look  and  process  before  taking  action.” 

Many  participants  believed  that  cues  about  target  location  should  be  provided  visually  and 
auditorily,  not  only  for  backup  in  case  a  cue  should  fail  but  also  when  a  visual  display  is 
unavailable,  as  would  be  the  case  when  the  commander-gunner  is  seated  with  his  head  outside 
the  vehicle.  When  the  commander-gunner  is  outside  the  vehicle,  the  participants  believed  that 
cues  about  the  location  of  targets  should  be  provided  relative  to  the  orientation  of  his  head. 

When  the  commander-gunner  is  seated  inside  the  vehicle,  visual  cues  about  target  location 
should  be  based  on  the  azimuth  orientation  of  the  main  gun.  A  number  of  participants  preferred 
that  the  auditory  cues  be  provided  in  3-D  Audio  rather  than  in  Spatial  Language,  but  they 
expressed  concern  that  two  speakers  were  needed  to  provide  the  3-D  Audio  cues.  They  claimed 
that  the  speakers  in  their  headsets  tend  to  fail  and  replacements  are  not  readily  available. 
According  to  the  Soldiers,  it  is  not  uncommon  for  crew  members  to  have  a  headset  with  only  one 
speaker  that  works.  Speakers  are  often  swapped  between  crew  members  to  ensure  that  each  has 
at  least  one  functional  speaker. 
6. Discussion


In this study, participants hit 100% of the targets presented in the Visual, Spatial Language, and
3-D Audio modes, compared with 94% in Baseline 1 and 95% in Baseline 2. As hypothesized,
target acquisition times were faster in modes where information about target location was
provided than in either of the two baseline conditions. On average, time to first shot in the
Visual, Spatial Language, and 3-D Audio modes was 1.4 seconds faster (19%) than in Baseline 1
and 2.3 seconds faster (27%) than in Baseline 2. No difference was found between the Visual
and 3-D Audio modes in time to first shot or time to hit, but target acquisition times were
1.0 second faster (15%) in these modes than in the Spatial Language condition. Similarly, no
difference was found between the 3-D Audio and Visual modes in overall workload scores, but
scores in the 3-D Audio mode were lower than scores in the Spatial Language and both baseline
conditions. On average, 23% less information was recalled from the SITREPs in Baseline 1 than
in the other four cue conditions, where attention could be directed to communications between
target presentations. No differences were found between Baseline 2 and the Visual, Spatial
Language, or 3-D Audio modes in the performance of this secondary task.
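The second figures in the paragraph above can be cross-checked against Table 1. A minimal sketch, assuming that "on average" means the unweighted mean of the relevant pairwise differences; the percentages (19%, 27%, 15%) additionally require the baseline means shown in figure 3, which are not reproduced in the text, so they are not recomputed here.

```python
from statistics import mean

# Table 1 (time to first shot, seconds), taken as row mode minus column mode.
BASELINE_1_MINUS = {"Visual": 1.766, "Spatial Language": 0.741, "3-D Audio": 1.712}
BASELINE_2_MINUS = {"Visual": 2.623, "Spatial Language": 1.598, "3-D Audio": 2.569}
# Spatial Language minus Visual is the negated Visual/Spatial Language cell (-1.025).
SPATIAL_LANGUAGE_MINUS = {"Visual": 1.025, "3-D Audio": 0.971}

cue_gain_vs_b1 = mean(list(BASELINE_1_MINUS.values()))
cue_gain_vs_b2 = mean(list(BASELINE_2_MINUS.values()))
spatial_language_cost = mean(list(SPATIAL_LANGUAGE_MINUS.values()))

print(f"{cue_gain_vs_b1:.1f} s, {cue_gain_vs_b2:.1f} s, {spatial_language_cost:.1f} s")
# 1.4 s, 2.3 s, 1.0 s
```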

Differences  found  between  cue  conditions  in  target  acquisition  performance  might  best  be 
compared  with  the  results  of  an  investigation  by  Simpson  et  al.  (2004).  In  this  latter  study,  the 
researchers  measured  times  to  visually  acquire  targets  in  a  simulated  flight  task  in  four  display 
conditions  that  were  like  those  assessed  in  the  current  investigation.  One  of  the  four  conditions 
provided no information about the presence or location of targets (no display), as in Baseline 1. In
a  second  condition,  an  auditory  alert  was  provided  to  signal  the  presence  of  a  target  (as  in 
Baseline  2)  but  was  also  accompanied  by  a  visual  display  that  showed  target  direction  and  relative 
elevation  (Visual  +  Audio  Alert).  In  the  remaining  two  conditions,  the  visual  display  was 
supplemented  by  a  non-spatialized  verbal  cue  (Visual  +  Clock-Coordinate)  or  a  spatialized  audio 
cue  (Visual  +  3-D  Audio).  As  in  the  present  investigation,  Simpson  et  al.  (2004)  found  that  target 
acquisition times were slower in the first two conditions, which were also the only conditions in
which  targets  went  undetected.  However,  pairing  the  auditory  alert  about  the  presence  of  a  target 
with  the  visual  display  showing  target  location  significantly  reduced  target  acquisition  times  over 
the  “no-display”  condition.  By  themselves,  the  auditory  alerts  about  the  presence  of  a  target 
provided  in  the  present  study  were  considered  “useless”  by  some  participants  who  observed  that  on 
a  number  of  occasions,  they  were  slow  in  responding  to  these  alerts. 

On average, target acquisition times in the Visual and 3-D Audio modes were 24% faster than in
Baseline 1 and 31% faster than in Baseline 2. By comparison, Simpson et al. (2004) found an
average 25% reduction in target acquisition time between the Visual + 3-D Audio mode and the
other conditions they studied. The 1-second difference in target acquisition time found between
the Spatial Language condition and the 3-D Audio and Visual modes in the current investigation
was also similar to the difference found between the Visual + 3-D Audio mode and the Visual +
Clock-Coordinate condition in Simpson et al. (2004). In this latter investigation, the researchers
did not present any concurrent verbal communications that could potentially interfere with the
perception of the verbal cues about target location. Because a comparable delay occurred without
such communications, the increase in target acquisition time in the Spatial Language condition of
the present study is more likely attributable to other factors than to interference from the radio
traffic. First, the 1-second delay in target acquisition time may have been
influenced  by  the  structure  of  the  verbal  cue.  In  both  studies,  words  that  defined  the  location  of 
the  target  were  presented  in  the  latter  half  of  the  cue,  preceded  by  a  verbal  alert  (e.g.,  “Target  —  9 
o’clock”  or  “Traffic  —  9  o’clock  high...”).  A  1-second  delay  in  the  receipt  of  information  about 
the  location  of  a  target  might  be  expected  to  result  in  a  similar  delay  in  the  time  to  acquire  the 
target.  However,  it  is  also  likely  that  the  encoding  of  the  Spatial  Language  cue  required  an 
additional  cognitive  step  that  involved  the  conversion  of  the  verbal  cue  about  the  clock  position 
of  the  target  into  a  spatial  image. 

In  a  more  recent  study  by  Haas,  Pillalamarri,  Stachowiak,  and  Lattin  (in  press),  target  location 
information  was  also  prefaced  by  a  verbal  alert,  but  the  location  of  the  target  was  presented  in 
plus  and  minus  degrees  rather  than  in  clock  positions  (e.g.,  “Target  —  minus  15  degrees”). 

Unlike the results of the present investigation, no differences were found between the verbal and
3-D audio cueing techniques in target acquisition time or perceived workload. It is believed that
the spatial language cues employed by Haas et al. (in press) may have provided a more
immediate indication of whether the target lay to the right or left of 0 degrees, as did the Visual
and 3-D Audio cues used in the present study. In the present study, any additional time and effort
spent transforming the clock-position Spatial Language cue into a target direction may not only
have contributed to increases in target acquisition time but also may have offset any potential
reductions in perceived workload over baseline conditions.
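The contrast drawn above between clock-position and signed-degree cues can be illustrated with a small sketch. The sign convention (left of 12 o'clock negative, right positive, 6 o'clock reported as +180) and the exact wording of the cues are assumptions for illustration; `clock_to_signed_degrees` and `verbal_cue` are hypothetical helpers, not the cue generators used in either study.

```python
def clock_to_signed_degrees(clock_position):
    """Signed azimuth of a clock position relative to 12 o'clock.

    Assumed convention: right of 12 o'clock is positive, left is negative,
    and 6 o'clock is reported as +180.
    """
    if not 1 <= clock_position <= 12:
        raise ValueError("clock position must be between 1 and 12")
    degrees = (clock_position % 12) * 30
    return degrees - 360 if degrees > 180 else degrees


def verbal_cue(clock_position, style="clock"):
    """Phrase a location cue in clock terms or in signed degrees (hypothetical wording)."""
    if style == "clock":
        return f"Target - {clock_position} o'clock"
    degrees = clock_to_signed_degrees(clock_position)
    if degrees == 0:
        return "Target - 0 degrees"
    side = "minus" if degrees < 0 else "plus"
    return f"Target - {side} {abs(degrees)} degrees"


print(verbal_cue(9))             # Target - 9 o'clock
print(verbal_cue(9, "degrees"))  # Target - minus 90 degrees
print(verbal_cue(2, "degrees"))  # Target - plus 60 degrees
```

In the degree form, the first word after the alert ("minus" or "plus") already indicates the direction to slew, whereas the clock form leaves the listener to map the number onto a side, which is the extra step suggested above.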

In  the  present  investigation,  no  differences  were  found  between  the  Visual  and  3-D  Audio  modes 
in  either  target  acquisition  performance  or  workload.  The  location  of  the  target  with  respect  to 
the  12-o’clock  position  was  more  readily  discerned  in  these  modes  than  in  the  Spatial  Language 
condition.  Given  that  targets  were  easy  to  detect,  even  at  high  slew  rates,  participants  may  not 
have  felt  compelled  to  localize  the  target  to  a  specific  clock  position.  Rather,  they  may  have 
merely  slewed  in  the  direction  indicated. 

Without  reliable  information  about  the  existence  or  location  of  targets,  commander-gunners  must 
spend  more  time  scanning  the  terrain  around  their  vehicle  in  search  of  potential  threats.  The  task 
of  detecting  and  identifying  targets  can  impose  significant  demands  on  cognitive  resources  and 
attention. Thus, in this study, it was anticipated that while the participants were searching for
targets, less attention would be available for acquiring information contained in radio
transmissions. Improvements in the performance of this secondary task were expected in conditions
where attention could be directed to communications between target presentations. Additionally, it
had been hypothesized that improvements would be greater in the Visual mode because the visual
cues about target location would not compete for auditory resources used by Soldiers in attending
to the radio transmissions. The verbal cues provided in the Spatial Language mode were expected to
be more disruptive to this secondary task than the 3-D Audio cues, which were conveyed in sounds
and not words. However, although many of the participants complained that the auditory cues
interfered with the SITREPs, no differences were found among the Visual, Spatial Language, and
3-D Audio modes in the performance of the communications task. Overall, the analysis of
secondary task performance suggests that the less time Soldiers spend scanning for targets, the
more time is available for them to attend to communications between target engagements.

It is believed that if targets had been less conspicuous than they were in the present investigation,
target acquisition times in all conditions would have been longer than those that were found.
Target acquisition times would still be expected to be faster in modes where target location cues
are provided. However, the time to acquire a target would become increasingly dependent on the
fidelity of the target location cue and the extent to which the cue enables the commander-gunner
to narrow the focus of his search. The more time and resources Soldiers spend in the search for
targets, the less time and fewer resources they will have available to perform a secondary
communications task.


7.  Conclusions  and  Recommendations 


In this study, cues that merely signaled the presence of a target did not provide any benefit in target
acquisition performance. However, as might be expected, targets were acquired significantly
faster when cues were provided about their location. Target acquisition times were faster in the
Visual and 3-D Audio modes than they were in the Spatial Language condition, but the reductions
in target acquisition time and overall workload provided by the 3-D Audio cues were not clearly
distinguishable from those provided by the cues in the Visual mode. Less information was recalled
from SITREPs in Baseline 1 than in the other four cue conditions, where attention could be directed
to the communications task between target presentations. However, contrary to expectations, no
differences were found between Baseline 2 and the Visual, Spatial Language, or 3-D Audio modes
in the performance of this secondary task.

The  results  of  this  investigation  are  preliminary.  Additional  studies  are  needed  to  explore  the 
advantages  and  disadvantages  of  the  information  presentation  techniques  assessed  in  this  study 
and  other  display  alternatives,  particularly  in  the  noise  and  vibration  conditions  that  are  typical  of 
the  combat  vehicle  environment.  Studies  that  follow  will  include  an  assessment  of  the  effects  of 
tactile  displays  on  target  acquisition  performance  and  attention  to  communications.  The 
modality  in  which  these  communications  are  presented  (i.e.,  auditory  and  visual)  will  be  included 
as a factor in these analyses. In these studies, targets will be more deeply embedded in the
surrounding terrain to provide a better indication of the fidelity of the target location cues and
the effects these cues have on primary and secondary task performance.

The technology that will provide information about target location has not yet been
defined, although work is under way to demonstrate such a capability. It cannot be assumed, as
it was here, that such a technology will detect 100% of targets with no false alarms. The
translation of information from sensors and other intelligence sources into reliable, high-fidelity
sensory cues about enemy position poses a significant challenge. The potential impact of cue
reliability on target acquisition performance and workload with the use of these and other display
techniques will need to be explored.
Appendix A. Situation Report (SITREP) Example


Attention!  Some  aspects  of  the  current  situation  have  changed  that  will  affect  your  mission.  The 
following  is  the  latest  from  Command  Headquarters. 

The  friendly  country  of  Dodge  has  been  overthrown  by  insurgent  religious  extremists.  The 
United  States  has  deployed  the  3rd  STRYKER  Brigade  Combat  Team  of  the  2nd  Infantry 
Division.  This  team  will  link  up  with  and  support  the  remaining  elements  of  the  government. 

The  team  is  currently  assembled  3  kilometers  west  of  the  airport  in  the  capital  city  of  Aberdeen. 
The  team  will  conduct  reconnaissance  operations  around  the  airport  prior  to  securing  it  for  the 
entry  of  heavy  U.S.  forces.  Your  platoon  will  proceed  to  Objective  Brown  which  is  500  meters 
east  of  your  current  assembly  point.  Other  STRYKER  platoons  will  proceed  northeast  and 
southeast  to  their  objectives  to  encircle  the  airport.  You  are  to  arrive  at  your  objective  at  0500 
hours  tomorrow  morning.  Potential  threats  on  the  way  to  your  objective  include  an  ambush 
along  Phaseline  Washington.  The  insurgents  have  captured  tanks  from  the  defense  forces  and 
have  deployed  them  around  the  perimeter  of  the  airport.  Your  closest  supporting  unit  is 
dismounted  infantry  located  100  meters  south  of  the  objective.  The  call  sign  of  this  supporting 
unit  is  Charlie  1.  A  friendly  artillery  unit  is  currently  2  kilometers  north  of  the  airport.  The  call 
sign  of  this  unit  is  Zulu  3.  They  will  await  your  signal  to  provide  support  as  needed.  Air  support 
will  be  provided  by  a  squadron  of  Apache  helicopters.  Their  call  sign  is  Eagle  1.  The  drop  off 
point  for  your  squad  is  Peach  Hill  which  is  100  meters  south  of  the  objective.  The  closest  enemy 
unit to the objective is armor, located 200 meters east of the objective. This armor unit is a
company-size  unit  and  is  currently  re-supplying.  Dismounted  enemy  infantry  are  located 
throughout  the  countryside  and  are  armed  with  RPGs.  They  have  placed  landmines  near  your 
objective  along  Church  Road. 
Appendix B. SITREP Questionnaire (Example)


Participant  #  :  _  Run  : _  Target  Set : _ 

Please  answer  the  following  questions  based  on  the  SITREP  you  heard  during  this  last  target  set. 
Each  answer  is  one  or  two  words.  The  number  in  parentheses  after  the  question  indicates  the 
number  of  words  in  the  answer.  Examples  of  one-word  answers  are  “armor”  or  “NBC.”  Examples 
of  two-word  answers  are  “mechanized  infantry”  or  “Charlie  Company.”  An  answer  which 
requires  two  words  can  also  include  a  number.  Examples  of  two-word  answers  with  a  number  are 
“200  meters”  or  “Charlie  35.”  Each  answer  is  worth  2  points.  If  you  do  not  provide  an  answer  to  a 
question,  or  the  answer  is  wrong,  you  will  lose  2  points.  If  you  omit  a  word  from  an  answer  that 
requires  two  words,  or  if  one  of  the  words  in  your  answer  is  wrong,  you  will  lose  1  point. 
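The scoring rules above can be expressed as a short function. This is a minimal sketch under an assumed reading: "lose 2 points" is taken to mean the question's 2 points are forfeited (so each question scores 0, 1, or 2), and words are compared case-insensitively and in any order. `score_answer` and `score_questionnaire` are hypothetical names, and the example answers are taken from the SITREP in Appendix A.

```python
def score_answer(response, key):
    """Score one SITREP answer: 2 if fully correct, 1 if one of two words is right, else 0.

    Assumed reading of the rules: "lose 2 points" means the question's 2 points are
    forfeited; words are compared case-insensitively and in any order.
    """
    expected = key.lower().split()
    remaining = (response or "").lower().split()
    matched = 0
    for word in expected:
        if word in remaining:
            remaining.remove(word)  # each given word can satisfy only one expected word
            matched += 1
    return matched * 2 // len(expected)


def score_questionnaire(responses, keys):
    """Total points for one questionnaire (20 points maximum for the ten-question example)."""
    return sum(score_answer(r, k) for r, k in zip(responses, keys))


print(score_answer("Dodge", "Dodge"))            # 2
print(score_answer("300 meters", "500 meters"))  # 1
print(score_answer("", "500 meters"))            # 0
```

A stricter scorer might also penalize extra words in a one-word answer; the instructions do not address that case, so this sketch leaves it unpenalized.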

Question  Answer  (Please  PRINT) 

(1) What is the name of the country that has been overthrown? (1) _

(2) In what direction is the STRYKER team from the airport? (1) _

(3) How far is your objective from your current assembly point? (2) _

(4) At what time is your platoon to arrive at the objective? (2) _

(5) What have the insurgents captured from the country’s defense forces? (1) _

(6) What is the call sign of the friendly unit located north of the airport? (2) _

(7) What is the name of the drop off point for the squad? (2) _

(8) How far is the closest enemy unit from the objective? (2) _

(9) In what activity is the enemy unit closest to your objective currently engaged? (1) _

(10) What is the name of the road near your objective where the enemy infantry has placed obstacles? (1) _
Appendix C. NASA-TLX


RATING SCALE DEFINITIONS

Title (Endpoints): Description

MENTAL DEMAND (Low/High): How much mental and perceptual activity was required (e.g.,
thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or
demanding, simple or complex, exacting or forgiving?

PHYSICAL DEMAND (Low/High): How much physical activity was required (e.g., pushing, pulling,
turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or
strenuous, restful or laborious?

TEMPORAL DEMAND (Low/High): How much time pressure did you feel due to the rate or pace at
which the task or task elements occurred? Was the pace slow and leisurely or rapid and frantic?

PERFORMANCE (Perfect/Failure): How successful do you think you were in accomplishing the goals
of the task set by the experimenter (or yourself)? How satisfied were you with your performance
in accomplishing these goals?

EFFORT (Low/High): How hard did you have to work (mentally and physically) to accomplish your
level of performance?

FRUSTRATION LEVEL (Low/High): How insecure, discouraged, irritated, stressed and annoyed versus
secure, gratified, content, relaxed and complacent did you feel during the task?